
Speech-to-Text API

This page outlines the fundamentals of using the Speech-to-Text API: the types of requests you can make, how to construct those requests, and how to handle their responses. We recommend reading this page in full before diving into the Speech API.

Speech Requests

Speech-to-Text has two main methods of performing speech recognition:

Synchronous Requests

With synchronous requests (REST), audio data is sent to the Speech-to-Text API, recognition is performed on that data, and results are returned once all audio has been processed. Synchronous recognition requests are limited to audio data of 1 minute or less in duration.

| Request Type | Audio Length Limit |
| --- | --- |
| Synchronous Request | ≤ 60 seconds |
| Asynchronous Request | ≤ 400 minutes |

Supported formats

  • File Type - We currently support only wav, amr, flac, and ogg audio files.

  • Sample Rate - We support all sample rates between 8 000 Hz and 48 000 Hz. If you can choose the sample rate of the source, record the audio at 16 000 Hz: lower sample rates may reduce the accuracy of our models, while sample rates above 16 000 Hz have no significant impact on accuracy.
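As a quick sanity check before uploading, the supported range above can be encoded in a few lines of Python. This is a sketch; the commented-out file path is illustrative.

```python
SUPPORTED_RANGE = (8000, 48000)  # supported sample rates, in Hz
RECOMMENDED_RATE = 16000         # recommended recording rate, in Hz

def check_sample_rate(rate_hz):
    """Classify a sample rate against the limits documented above."""
    low, high = SUPPORTED_RANGE
    if not low <= rate_hz <= high:
        return "unsupported"
    if rate_hz < RECOMMENDED_RATE:
        return "supported, but may reduce accuracy"
    return "supported"

print(check_sample_rate(16000))
print(check_sample_rate(8000))

# To check a local file with the standard library:
# import wave
# with wave.open("recording.wav", "rb") as f:
#     print(check_sample_rate(f.getframerate()))
```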


Synchronous Request

Synchronous recognition requests are the simplest means of performing recognition on speech audio data. The Speech-to-Text API can process up to 1 minute of speech audio data sent in a synchronous request. After the Speech-to-Text API processes and recognizes all of the audio, it returns a response. A sample request is shown in the section that follows:

Endpoint: /asr

https://api.botlhale.xyz/asr
Tip: You need to include an authentication token in the request headers. See the Authentication page of this documentation for information on how to generate authentication tokens.

Method: POST

This endpoint processes speech files for Automatic Speech Recognition (ASR). It transcribes spoken language into text and returns the transcription. The audio file is also temporarily stored and uploaded to an S3 bucket, with the S3 file name included in the response.

Authentication

A valid Bearer token must be included in the request headers for authentication.

Headers:

  • Authorization: Bearer <your_token>

Form Arguments

| Request Params | Data Type | Required | Description |
| --- | --- | --- | --- |
| speech_file | File | Required | The audio file to be transcribed. |
| redact | bool | Optional | Whether to redact sensitive information (for example, names and locations) in the transcription. |
| language_code | String | Optional | The language code of the spoken language in the audio file. If not provided, automatic language detection will be attempted. |

Response body

The API returns a JSON object with the following structure:

```json
{
  "transcription": "Hello, how can I assist you?",
  "s3_filename": "uploads/audio_123456.wav",
  "date_received": "2025-01-28T10:00:00Z"
}
```

Fields:

| Field | Data Type | Description |
| --- | --- | --- |
| transcription | string | The transcribed text from the speech file. |
| s3_filename | string | The name of the uploaded file in the S3 bucket. |
| date_received | string | The timestamp when the request was processed, in ISO 8601 format. |
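Putting the endpoint, headers, and form arguments above together, a synchronous request might look like the sketch below. The token and file path are placeholders, the string encoding of the optional redact flag is an assumption, and the nso-ZA language code is taken from examples later on this page.

```python
def auth_headers(token):
    """Headers required by /asr (Bearer token, per the Authentication section)."""
    return {"Authorization": f"Bearer {token}"}

def build_asr_form(redact=False, language_code=None):
    """Optional form fields for /asr; omit language_code to use auto-detection."""
    form = {"redact": str(redact).lower()}  # assumed "true"/"false" encoding
    if language_code:
        form["language_code"] = language_code
    return form

def transcribe(path, token, language_code=None, redact=False):
    """POST a local audio file to /asr and return the transcription."""
    import requests  # third-party: pip install requests
    with open(path, "rb") as f:
        response = requests.post(
            "https://api.botlhale.xyz/asr",
            headers=auth_headers(token),
            data=build_asr_form(redact=redact, language_code=language_code),
            files={"speech_file": (path, f, "audio/wav")},
        )
    response.raise_for_status()
    return response.json()["transcription"]

# Example (not run here):
# print(transcribe("recording.wav", "<IdToken>", language_code="nso-ZA"))
```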

Speech to Text endpoints (async)

Asynchronous Request

Asynchronous recognition requests are another means of performing recognition on speech audio data. This request type requires you to first upload the audio file to our server for the asynchronous process to start. The request initiates an asynchronous operation and returns immediately, before processing completes. Asynchronous speech recognition can be used for audio data with a length of up to 400 minutes.

Endpoint: /asr/async/upload

https://api.botlhale.xyz/asr/async/upload
Tip: You need to include an authentication token in the request headers. See the Authentication page of this documentation for information on how to generate authentication tokens.

Method: POST

This endpoint generates a presigned URL that allows users to upload a speech file for asynchronous Automatic Speech Recognition (ASR) processing. Once the file is uploaded, it will be processed asynchronously, and a notification can be sent to a specified URL when the transcription is complete.

Authentication

A valid Bearer token must be included in the request headers.

Headers:

  • Authorization: Bearer <your_token>

Form Arguments

| Request Params | Data Type | Required | Description |
| --- | --- | --- | --- |
| org_id | String | Required | The unique identifier for the organization making the request. |
| language_code | String | Optional | The language spoken in the supplied audio clip. If not provided, the language will be auto-detected. |
| sample_rate | Integer | Optional, default: 16000 | The sample rate of the supplied audio clip in hertz. |
| diarization | Boolean | Optional, default: False | Whether to use speaker diarization to differentiate between multiple speakers. |
| voice_id | String | Optional | A unique identifier for the speaker, if applicable. |
| notify_url | String | Optional | A URL to notify once the ASR processing is complete. |

Response Body

The API returns a JSON object containing a presigned URL and the fields required for uploading the audio file:

```json
{
  "upload_url": "https://s3-bucket-url.com/presigned-upload-link",
  "fields": {
    "key": "asr_uploads/audio_123456.wav",
    "AWSAccessKeyId": "AKIA...",
    "policy": "base64-encoded-policy",
    "signature": "signature-string"
  },
  "expires_in": 3600
}
```
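The form arguments above can be assembled into a request for a presigned URL as in the sketch below. The token and org ID are placeholders, and the string encoding of `sample_rate` and `diarization` is an assumption.

```python
def build_upload_form(org_id, language_code=None, sample_rate=16000,
                      diarization=False, voice_id=None, notify_url=None):
    """Form fields for /asr/async/upload, omitting unset optional values."""
    form = {"org_id": org_id, "sample_rate": str(sample_rate),
            "diarization": str(diarization)}  # assumed string encoding
    optional = {"language_code": language_code, "voice_id": voice_id,
                "notify_url": notify_url}
    form.update({k: v for k, v in optional.items() if v})
    return form

def request_presigned_upload(token, org_id, **options):
    """Ask /asr/async/upload for a presigned S3 upload URL."""
    import requests  # third-party: pip install requests
    response = requests.post(
        "https://api.botlhale.xyz/asr/async/upload",
        headers={"Authorization": f"Bearer {token}"},
        data=build_upload_form(org_id, **options),
    )
    response.raise_for_status()
    return response.json()  # contains upload_url, fields, expires_in

# Example (not run here):
# presigned = request_presigned_upload("<IdToken>", "<OrgID>", diarization=True)
# print(presigned["upload_url"], presigned["expires_in"])
```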

Upload via Presigned URL

The generated presigned URL includes both a URL and additional fields that must be passed as part of the subsequent HTTP POST request. The following code demonstrates how to use the requests package with a presigned POST URL to perform a POST request for file upload.

Form Arguments

| Request Params | Data Type | Required | Example / Description |
| --- | --- | --- | --- |
| policy | String | Required | eyJleHBpcmF0aW9uIjogIjIwMjUtMDItMjFUMDc6MTc6MzRa.... |
| x-amz-algorithm | String | Required | AWS4-HMAC-SHA256 |
| x-amz-credential | String | Required | ASIA2ADMPV7EBIIIA3UR/20250221/eu-west-1/s3/aws4_request |
| x-amz-date | String | Required | 20250221T061734Z |
| x-amz-security-token | String | Required | IQoJb3JpZ2luX2VjEKf//////////wEaCWV1LXdlc3QtMSJHMEUC... |
| x-amz-signature | String | Required | e3cca032a465e57b837b5d2... |
| file | File | Required | The audio file to upload. |

Request Example

```python
import requests

url = "{{uploadUrl}}"

payload = {
    'AWSAccessKeyId': '{{fields-AWSAccessKeyId}}',
    'key': '{{fields-key}}',
    'policy': '{{fields-policy}}',
    'signature': '{{fields-signature}}',
    'x-amz-security-token': '{{fields-x-amz-security-token}}'
}
files = [
    ('file', ('tts_aw215n3s4ni4_IsiZulu_H127Bqf8aN08.wav',
              open('KpALthHva/tts_aw215n3s4ni4_IsiZulu_H127Bqf8aN08.wav', 'rb'),
              'audio/wav'))
]

response = requests.post(url, data=payload, files=files)
print(response.text)
```

Endpoint: /asr/async/status

https://api.botlhale.xyz/asr/async/status
Tip: You need to include an authentication token in the request headers. See the Authentication page of this documentation for information on how to generate authentication tokens.

Method: GET

This endpoint returns the status of the asynchronous recognition process.

| Request Params | Data Type | Required | Description |
| --- | --- | --- | --- |
| OrgID | String | Required | Organisation ID. |
| FileName | String | Required | The filename generated from the async upload process. |

Request Example

```python
import requests

url = "https://api.botlhale.xyz/asr/async/status?OrgID=<OrgID>&FileName=<filename>"

headers = {
    'Authorization': 'Bearer <IdToken>'
}

response = requests.get(url, headers=headers)
print(response.json())
```

Response body

```json
{
  "data": [
    {
      "OrgID": "<OrgID>",
      "id": 207891841473145364,
      "process": "<filename>.wav",
      "processTime": "processTime",
      "status": "running"
    }
  ]
}
```

Endpoint: /asr/async/data

https://api.botlhale.xyz/asr/async/data
Tip: You need to include an authentication token in the request headers. See the Authentication page of this documentation for information on how to generate authentication tokens.

Method: GET

This endpoint returns the results of the asynchronous recognition process.

| Request Params | Data Type | Required | Description |
| --- | --- | --- | --- |
| OrgID | String | Required | Organisation ID. |
| FileName | String | Required | The filename generated from the async upload process. |

Request Example

```python
import requests

url = "https://api.botlhale.xyz/asr/async/getdata?OrgID=<OrgID>&FileName=<filename>"

headers = {
    'Authorization': 'Bearer <IdToken>'
}

response = requests.get(url, headers=headers)
print(response.json())
```

Response body

```json
{
  "audio_length": "30.0",
  "filename": "/<filename>.wav",
  "status": "complete",
  "time": {
    "diarization": 6.815945625305176,
    "recognition": 4.098539113998413
  },
  "timestamps": [
    {
      "end": 1260.0000000000005,
      "filename": "1_speaker_0_660.0000000000003_1260.0000000000005_nso-ZA.wav",
      "language": "nso-ZA",
      "speaker": "speaker_0",
      "start": 660.0000000000003,
      "transcription": "<transcription>"
    },
    {
      "end": 2310.0000000000014,
      "filename": "2_speaker_1_1260.000000000001_2310.0000000000014_nso-ZA.wav",
      "language": "nso-ZA",
      "speaker": "speaker_1",
      "start": 1260.000000000001,
      "transcription": "<transcription>"
    },
    {
      "end": 2699.9999999999995,
      "filename": "3_speaker_0_2309.9999999999995_2699.9999999999995_nso-ZA.wav",
      "language": "nso-ZA",
      "speaker": "speaker_0",
      "start": 2309.9999999999995,
      "transcription": "<transcription>"
    },
    {
      "end": 6359.999999999998,
      "filename": "4_speaker_1_2699.9999999999973_6359.999999999998_nso-ZA.wav",
      "language": "nso-ZA",
      "speaker": "speaker_1",
      "start": 2699.9999999999973,
      "transcription": "<transcription>"
    },
    {
      "end": 6780.000000000008,
      "filename": "5_speaker_0_6360.000000000008_6780.000000000008_nso-ZA.wav",
      "language": "nso-ZA",
      "speaker": "speaker_0",
      "start": 6360.000000000008,
      "transcription": "<transcription>"
    },
    {
      "end": 7860.000000000012,
      "filename": "6_speaker_1_6780.000000000012_7860.000000000012_nso-ZA.wav",
      "language": "nso-ZA",
      "speaker": "speaker_1",
      "start": 6780.000000000012,
      "transcription": "<transcription>"
    },
    {
      "end": 8580.000000000022,
      "filename": "7_speaker_0_7860.000000000021_8580.000000000022_nso-ZA.wav",
      "language": "nso-ZA",
      "speaker": "speaker_0",
      "start": 7860.000000000021,
      "transcription": "<transcription>"
    },
    {
      "end": 13950.000000000011,
      "filename": "8_speaker_1_8580.00000000001_13950.000000000011_nso-ZA.wav",
      "language": "nso-ZA",
      "speaker": "speaker_1",
      "start": 8580.00000000001,
      "transcription": "<transcription>"
    },
    {
      "end": 15239.999999999889,
      "filename": "9_speaker_1_14249.999999999887_15239.999999999889_nso-ZA.wav",
      "language": "nso-ZA",
      "speaker": "speaker_1",
      "start": 14249.999999999887,
      "transcription": "<transcription>"
    },
    {
      "end": 15929.999999999867,
      "filename": "10_speaker_0_15239.999999999867_15929.999999999867_nso-ZA.wav",
      "language": "nso-ZA",
      "speaker": "speaker_0",
      "start": 15239.999999999867,
      "transcription": "<transcription>"
    },
    {
      "end": 18629.999999999854,
      "filename": "11_speaker_1_15929.999999999853_18629.999999999854_nso-ZA.wav",
      "language": "nso-ZA",
      "speaker": "speaker_1",
      "start": 15929.999999999853,
      "transcription": "<transcription>"
    },
    {
      "end": 19739.99999999995,
      "filename": "12_speaker_0_18629.99999999995_19739.99999999995_nso-ZA.wav",
      "language": "nso-ZA",
      "speaker": "speaker_0",
      "start": 18629.99999999995,
      "transcription": "<transcription>"
    },
    {
      "end": 21839.999999999993,
      "filename": "13_speaker_1_19739.999999999993_21839.999999999993_nso-ZA.wav",
      "language": "nso-ZA",
      "speaker": "speaker_1",
      "start": 19739.999999999993,
      "transcription": "<transcription>"
    },
    {
      "end": 22410.000000000073,
      "filename": "14_speaker_0_21840.00000000007_22410.000000000073_nso-ZA.wav",
      "language": "nso-ZA",
      "speaker": "speaker_0",
      "start": 21840.00000000007,
      "transcription": "<transcription>"
    },
    {
      "end": 24360.00000000009,
      "filename": "15_speaker_1_22410.00000000009_24360.00000000009_nso-ZA.wav",
      "language": "nso-ZA",
      "speaker": "speaker_1",
      "start": 22410.00000000009,
      "transcription": "<transcription>"
    },
    {
      "end": 25590.000000000167,
      "filename": "16_speaker_0_24360.000000000167_25590.000000000167_nso-ZA.wav",
      "language": "nso-ZA",
      "speaker": "speaker_0",
      "start": 24360.000000000167,
      "transcription": "<transcription>"
    },
    {
      "end": 26430.000000000215,
      "filename": "17_speaker_1_25590.000000000215_26430.000000000215_nso-ZA.wav",
      "language": "nso-ZA",
      "speaker": "speaker_1",
      "start": 25590.000000000215,
      "transcription": "<transcription>"
    },
    {
      "end": 28380.000000000244,
      "filename": "18_speaker_0_26430.000000000244_28380.000000000244_nso-ZA.wav",
      "language": "nso-ZA",
      "speaker": "speaker_0",
      "start": 26430.000000000244,
      "transcription": "<transcription>"
    },
    {
      "end": 29220.00000000032,
      "filename": "19_speaker_1_28380.00000000032_29220.00000000032_nso-ZA.wav",
      "language": "nso-ZA",
      "speaker": "speaker_1",
      "start": 28380.00000000032,
      "transcription": "<transcription>"
    },
    {
      "end": 30000.000000000353,
      "filename": "20_speaker_0_29220.00000000035_30000.000000000353_nso-ZA.wav",
      "language": "nso-ZA",
      "speaker": "speaker_0",
      "start": 29220.00000000035,
      "transcription": "<transcription>"
    }
  ]
}
```

Speech to Text endpoints (async)

Endpoint: /asr/async/upload

Method: POST

This endpoint generates a presigned URL that allows users to upload a speech file for asynchronous Automatic Speech Recognition (ASR) processing. Once the file is uploaded, it will be processed asynchronously, and a notification can be sent to a specified URL when the transcription is complete.

Authentication

A valid Bearer token must be included in the request headers.

Headers:

  • Authorization: Bearer <your_token>

Form Arguments

| Request Params | Data Type | Required | Description |
| --- | --- | --- | --- |
| org_id | string | Required | The unique identifier for the organization making the request. |
| language_code | string | Optional | The language spoken in the supplied audio clip. If not provided, the language will be auto-detected. |
| sample_rate | integer | Optional, default: 16000 | The sample rate of the supplied audio clip in hertz. |
| diarization | bool | Optional, default: False | Whether to use speaker diarization to differentiate between multiple speakers. |
| voice_id | string | Optional | A unique identifier for the speaker, if applicable. |
| notify_url | string | Optional | A URL to notify once the ASR processing is complete. |

Response

The API returns a JSON object containing a presigned URL and the required fields for uploading the audio file.

Example Response:

```json
{
  "upload_url": "https://s3-bucket-url.com/presigned-upload-link",
  "fields": {
    "key": "asr_uploads/audio_123456.wav",
    "AWSAccessKeyId": "AKIA...",
    "policy": "base64-encoded-policy",
    "signature": "signature-string"
  },
  "expires_in": 3600
}
```


Response Fields:

| Field | Data Type | Description |
| --- | --- | --- |
| upload_url | string | The presigned S3 URL where the speech file should be uploaded. |
| fields | dictionary | Contains additional parameters required for the file upload, including authentication credentials. |
| expires_in | integer | The number of seconds before the presigned URL expires (default: 3600, i.e. 1 hour). |
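If you pass a notify_url when requesting the upload, you will need an HTTP endpoint to receive the completion callback. The sketch below is a minimal receiver using only the Python standard library; the callback payload format is not documented on this page, so the handler simply logs whatever JSON body arrives (an assumption).

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def parse_notification(body):
    """Decode a callback body as JSON, falling back to raw text."""
    try:
        return json.loads(body or b"{}")
    except json.JSONDecodeError:
        return {"raw": body.decode("utf-8", "replace")}

class NotifyHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = parse_notification(self.rfile.read(length))
        print("ASR notification:", payload)
        self.send_response(200)  # acknowledge receipt
        self.end_headers()

# Example (not run here): listen on the port your notify_url points at.
# HTTPServer(("0.0.0.0", 8080), NotifyHandler).serve_forever()
```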

Endpoint: /asr/async/status

Method: GET

This endpoint retrieves the status of an asynchronous ASR (Automatic Speech Recognition) process and returns the results if the process is completed.

Authentication

A valid Bearer token must be included in the request headers.

Headers:

  • Authorization: Bearer <your_token>

Query Parameters

| Request Params | Data Type | Required | Description |
| --- | --- | --- | --- |
| org_id | string | Required | The organization ID associated with the request. |
| filename | string | Required | The filename generated during the async ASR upload process. |

Response

The API returns a JSON object containing the status of the process and the results if available.

Example Response (Running):

```json
{
  "status": "running",
  "location": "location",
  "inference_id": "org_98765"
}
```

Example Response (Completed):

```json
{
  "status": "completed",
  "location": "location",
  "inference_id": "org_98765"
}
```
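Rather than calling the status endpoint in a tight loop, clients can poll with a capped exponential backoff. This is a sketch: the delay values are assumptions, and the "completed" status string follows the example above.

```python
import time

def backoff_delays(initial=2.0, factor=2.0, cap=30.0):
    """Yield an unbounded sequence of exponentially growing, capped delays."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * factor, cap)

def poll_until_complete(fetch_status, max_attempts=20, sleep=time.sleep):
    """Call fetch_status() until it returns "completed"; True on success."""
    for attempt, delay in zip(range(max_attempts), backoff_delays()):
        if fetch_status() == "completed":
            return True
        sleep(delay)
    return False

# Example (not run here), using requests and the status endpoint above:
# import requests
# def fetch_status():
#     url = "https://api.botlhale.xyz/asr/async/status?OrgID=<OrgID>&FileName=<filename>"
#     return requests.get(url, headers={"Authorization": "Bearer <IdToken>"}).json()["status"]
# poll_until_complete(fetch_status)
```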

Endpoint: /asr/async/data

Method: GET

This endpoint retrieves the status and detailed results of an asynchronous ASR (Automatic Speech Recognition) process, including transcription, speaker diarization, timestamps, and redacted versions of speech segments.

Authentication

A valid Bearer token must be included in the request headers.

Headers:

  • Authorization: Bearer <your_token>

Query Parameters

| Request Params | Data Type | Required | Description |
| --- | --- | --- | --- |
| org_id | string | Required | The organization ID associated with the request. |
| filename | string | Required | The filename generated during the async ASR upload process. The format should be `OrgID/filename.wav`. |

Response

The API returns a JSON object containing metadata about the processed audio and detailed transcription data.

Example Response (Completed)

```json
{
  "audio_length": 293.52,
  "date_received": "29/01/2025 23:04:28",
  "filename": "asr_618Ilr3ux6b7__16000_BotlhaleAI999__True_https:******botlhaleai**free**beeceptor**com_3M50XY73w6LS29012025_230351",
  "time": {
    "diarization": 11.390989780426025,
    "recognition": 84.02089238166809
  },
  "timestamps": [
    {
      "emotion": "neu",
      "end": 4083.191850594227,
      "filename": "0_SPEAKER_00_1112.0543293718165_4083.191850594227.wav",
      "language": "English",
      "redaction": "Hello, good day. You're speaking to <PERSON> from <LOCATION>. How are you doing today?",
      "speaker": "SPEAKER_00",
      "start": 1112.0543293718165,
      "times": {
        "asr": 1.1830341815948486,
        "emotion": 4.76837158203125e-07,
        "red": 0.024340391159057617,
        "sli": 0.01774907112121582
      },
      "transcription": "Hello, good day. You're speaking to Nick from Kuru. How are you doing today?",
      "transcription_no_LM": "",
      "translation": "-"
    },
    {
      "emotion": "neu",
      "end": 292979.6264855688,
      "filename": "80_SPEAKER_01_291281.83361629886_292979.6264855688.wav",
      "language": "English",
      "redaction": "Thank you.",
      "speaker": "SPEAKER_01",
      "start": 291281.83361629886,
      "times": {
        "asr": 0.9126956462860107,
        "emotion": 4.76837158203125e-07,
        "red": 0.020905494689941406,
        "sli": 0.01774907112121582
      },
      "transcription": "Thank you.",
      "transcription_no_LM": "",
      "translation": "-"
    }
  ]
}
```

Response Fields

General Metadata

| Field | Data Type | Description |
| --- | --- | --- |
| audio_length | float | The total duration of the audio file in seconds. |
| date_received | string | The date and time when the request was received. |
| filename | string | The filename associated with the ASR request. |
| time | dictionary | Processing time details: diarization (float), time spent on speaker diarization; recognition (float), time spent on speech recognition. |

Timestamps (List of Speech Segments)

Each object in the timestamps list represents a spoken segment with the following details:

| Field | Data Type | Description |
| --- | --- | --- |
| start | float | Start time (milliseconds). |
| end | float | End time (milliseconds). |
| speaker | string | Identified speaker ID (SPEAKER_00, SPEAKER_01, etc.). |
| filename | string | The audio snippet filename corresponding to this segment. |
| language | string | The detected language of the speech. |
| emotion | string | The predicted emotion (e.g., "neu" for neutral). |
| redaction | string | The redacted version of the speech, replacing sensitive data (`<PERSON>`, `<LOCATION>`). |
| transcription | string | The full transcript of the spoken segment. |
| transcription_no_LM | string | A version of the transcript without language model post-processing. |
| translation | string | Translation of the speech (if applicable). |

Processing Time per Segment

| Field | Data Type | Description |
| --- | --- | --- |
| asr | float | Time taken for automatic speech recognition. |
| emotion | float | Time taken for emotion detection. |
| red | float | Time taken for redaction processing. |
| sli | float | Time taken for spoken language identification. |
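The timestamps list lends itself to post-processing. As a minimal sketch, the function below orders segments by start time and renders a speaker-labelled transcript; the field names follow the tables above, and the sample data is invented.

```python
def to_transcript(timestamps):
    """Render timestamp segments as '[mm:ss.ss] speaker: text' lines, in time order."""
    lines = []
    for seg in sorted(timestamps, key=lambda s: s["start"]):
        seconds = seg["start"] / 1000.0  # start/end are in milliseconds
        stamp = f"{int(seconds // 60):02d}:{seconds % 60:05.2f}"
        lines.append(f"[{stamp}] {seg['speaker']}: {seg['transcription']}")
    return "\n".join(lines)

# Invented sample data in the shape of the "timestamps" entries above.
sample = [
    {"start": 4083.2, "speaker": "SPEAKER_01", "transcription": "I'm well, thank you."},
    {"start": 1112.1, "speaker": "SPEAKER_00", "transcription": "Hello, good day."},
]
print(to_transcript(sample))
```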

Contact us


We are here to help! Please contact us with any questions.